Scalable data logging

ABSTRACT

A scalable data collection and logging system can collect and persist data from a high scale network environment with less strain on memory resources. The system includes data collectors which collect data from network devices and store the data in device-specific logs in memory. The system also includes log monitors which periodically offload and compress the logs in memory and append the logs to device-specific log files in storage, thereby freeing up memory space and persisting the log data for future analysis. A log monitor manager load balances offloading operations across available log monitors and can instantiate additional log monitors to scale the offloading operations as a network grows. Additionally, another process can monitor log files in storage and truncate them as needed to limit the amount of storage space consumed by the log files.

BACKGROUND

The disclosure generally relates to the field of data processing, and more particularly to network data collection.

For management of devices in a network, a system manager collects data from the devices. Data collection can be done according to the well-defined Simple Network Management Protocol (SNMP). The system manager or network manager can send GET requests to SNMP-enabled devices with specified object identifiers (OIDs) to collect a device definition, a device attribute, a 2 dimensional array of managed objects, etc. SNMP-enabled devices provide responses with the values corresponding to the requested OIDs. This request-response exchange may be referred to as SNMP data collection. Similar data collection operations may be performed for non-SNMP devices using device specific protocols or application programming interfaces. The collected data may be stored in logs which can be used for network performance monitoring or root cause analysis.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects of the disclosure may be better understood by referencing the accompanying drawings.

FIG. 1 depicts an example illustration of a data collection and logging system with data collectors and scalable log monitors that persist collected log data in storage.

FIG. 2 depicts an example system for assignment of logs to instantiated log monitors.

FIG. 3 is a flowchart of example operations for managing log monitors.

FIG. 4 is a flowchart of example operations for offloading logs in memory to a storage device.

FIG. 5 is a flowchart of example operations for monitoring log files in persistent storage.

FIG. 6 depicts an example computer system with a scalable data logging application.

DESCRIPTION

The description that follows includes example systems, methods, techniques, and program flows that embody aspects of the disclosure. However, it is understood that this disclosure may be practiced without these specific details. For instance, this disclosure refers to offloading data collected from devices in a network in illustrative examples. Aspects of this disclosure can also be applied to data storage systems which buffer data in memory prior to transferring the data to persistent storage. In other instances, well-known instruction instances, protocols, structures, and techniques have not been shown in detail in order not to obfuscate the description.

Introduction

In a high scale network environment, a network may include thousands of devices. Collecting network management data from these devices can tax the resources of data collection systems. For example, if a network includes 5,000 devices, collecting and storing a 2 megabyte log for each device requires 10 gigabytes of memory for the logs alone. Insufficient memory can cause data loss as log data is overwritten and can cause poor performance or failure of data collection systems. While some data collection systems may dump log data from memory into a log file, the aggregation of log data from thousands of devices into a few log files can create issues with file operations and data management/analysis.

Overview

A scalable data collection and logging system can collect and persist data from a high scale network environment with less strain on memory resources. The system includes data collectors which collect data from network devices and store the data in device-specific logs in memory. The system also includes log monitors which periodically offload and compress the logs in memory and append the logs to device-specific log files in storage, thereby freeing up memory space and persisting the log data for future analysis. A log monitor manager load balances offloading operations across available log monitors and can instantiate additional log monitors to scale the offloading operations as a network grows. Additionally, another process can monitor log files in storage and truncate them as needed to limit the amount of storage space consumed by the log files.

Example Illustrations

FIG. 1 depicts an example illustration of a data collection and logging system with data collectors and scalable log monitors that persist collected log data in storage. FIG. 1 depicts a data collection and logging system 110 which includes a data collector manager 130, a log monitor manager 140, a log file monitor 146, and a reporting engine 150. The data collection and logging system 110 utilizes a memory 135 and a storage device 145. The storage device 145 may be a hard disk, flash storage, storage cluster, cloud storage, etc. The data collector manager 130 manages a data collector 131, a data collector 132, and a data collector 133 which collect data through a network 105 from network devices 118. The network devices 118 include devices such as servers, routers, switches, etc., which may be a combination of SNMP-compliant devices, non-SNMP compliant devices, etc. The data collectors 131, 132, and 133 are agents, daemons, or services instantiated by the data collector manager 130 that run on nodes of the data collection and logging system 110 (e.g., computing devices that may be considered servers). Similarly, log monitor 141 and log monitor 142 are agents, daemons, or services instantiated by the log monitor manager 140 that run on nodes of the data collection and logging system 110 and monitor logs of collected data in the memory 135.

At stage A, the data collectors 131, 132, and 133 (“data collectors”) collect data from the network devices 118 through the network 105. The data collectors may communicate with the network devices 118 through a physical or wireless connection to the network 105, which may be a local area network, a wide area network, or a combination of the foregoing. The data collectors may communicate with the network devices 118 by way of communication protocols (e.g., Transmission Control Protocol/Internet Protocol (TCP/IP), User Datagram Protocol (UDP), file transfer protocol (FTP)) to collect data related to operating conditions, performance metrics, etc., of the network devices 118. This data collection can be done periodically according to the Simple Network Management Protocol (SNMP), device specific plug-ins/scripts, application program interfaces (APIs), etc. The collected data may be used to determine metrics such as device or network health and performance (e.g., availability, throughput, bandwidth utilization, latency, error rates, and processor utilization). The data collectors may also process the collected data by converting a file format containing the collected data or by normalizing the data and/or data structure to conform to the current network management system and/or for ease of processing or uniformity. For example, collected data may be in a variety of formats (e.g., XML, CSV, JavaScript Object Notation (JSON), etc.), different data structures (e.g., array, record, graph, etc.) and/or different numeric systems (e.g., Metric, English) and may be converted from one numeric system to another.

In FIG. 1, each of the data collectors is depicted as collecting data for a single Internet Protocol (IP) address assigned to a device in the network devices 118. The data collector 131 collects data for a device corresponding to the IP address 192.168.1.1, the data collector 132 collects for 192.168.1.2, and the data collector 133 collects for 192.168.1.3. In some implementations, each data collector may collect data for hundreds or thousands of devices. The data collector manager 130 assigns devices from the network devices 118 to the data collectors and load balances the data collection assignments across the available data collectors. The data collector manager 130 can instantiate additional data collectors as devices are added to the network devices 118 or as additional resources for data collection become available. For example, if another compute node is made available to the data collection and logging system 110, the data collector manager 130 may instantiate another data collector to execute on the node and redistribute collection assignments across the data collectors.

At stage B, the data collectors write the collected data to logs in the memory 135. The data collector manager 130 may allocate memory for a data collector when the data collector is instantiated, or a data collector may be programmed to allocate memory for a log for each device from which data is collected. A log is a collection of log entries and may be a data structure such as an array, linked list, circular buffer, etc. Each entry in a log may be a data object or text that includes the collected data, calculated metrics, etc. Each entry may be separated by a new line character, comma, semicolon, etc. In FIG. 1, the log entries are numbered for ease of explanation and do not depict typical log entry data. A data collector writes a new entry to the corresponding log each time data is collected from the device corresponding to the IP address or log. For example, in FIG. 1, the data collector 131 has written a total of fifty entries to the log for the IP address 192.168.1.1 indicating that data has been collected fifty times from the corresponding network device. The data collector 132 has written a total of fifty-eight entries, and the data collector 133 has written ten entries. The data collectors may collect data at different rates, collect data from specified network devices more frequently, or may be configured to only log collected data when certain conditions are met. For example, a data collector may only log collected data if a specified metric has changed in value since the previously collected metric value. As a result, the size and growth rate of the logs may differ.

At stage C, the log monitor manager 140 instantiates the log monitor 141 and the log monitor 142 to monitor logs in the memory 135. As additional device logs are created in the memory 135, the log monitor manager 140 instantiates log monitors to be responsible for offloading log entries from the memory 135 to log files in the storage device 145. In FIG. 1, the log monitor manager 140 has instantiated the log monitor 141 to monitor the logs for the IP addresses 192.168.1.1 and 192.168.1.2 and has instantiated the log monitor 142 to monitor the log for the IP address 192.168.1.3. The log monitor manager 140 assigns and load balances the logs across available log monitors. In the example of FIG. 1, if an additional log is created in the memory 135, the log monitor manager 140 will assign the new log to the log monitor 142 as the log monitor 142 is currently assigned a single log, whereas the log monitor 141 is assigned two logs. The log monitor manager 140 may use different heuristics for load balancing assignments across available log monitors and for determining whether to instantiate additional log monitors. The log monitor manager 140 may use a formula that determines log assignments based on IP addresses, as described in more detail with reference to FIG. 2. The log monitor manager 140 may monitor a number of logs assigned to each log monitor and assign new logs to the log monitor with the least number of assigned logs. Additionally, if the number of logs assigned to each log monitor exceeds a threshold, the log monitor manager 140 may determine that an additional log monitor should be instantiated. The number of log monitors instantiated by the log monitor manager 140 can vary based on a total number of logs in the memory 135, available resources such as memory and processing power, and desired performance. Some systems may have a limit on a number of simultaneous file access operations. For example, some operating systems may limit simultaneous file access operations to one thousand simultaneous operations. Since the log monitors access files to offload log data, the upper bound on a total number of log monitors is equal to the simultaneous file access limit.
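
The least-loaded heuristic described at stage C can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `assign_new_log`, the dictionary layout, the monitor labels, and the new address 192.168.1.4 are all assumptions chosen for the example.

```python
def assign_new_log(assignments, new_log, max_logs_per_monitor=None):
    """Assign new_log to the monitor that currently has the fewest logs.

    Returns the chosen monitor name, or None when every monitor is already
    at the (hypothetical) per-monitor limit, which signals that an
    additional log monitor should be instantiated.
    """
    # Pick the monitor with the smallest number of assigned logs.
    monitor = min(assignments, key=lambda name: len(assignments[name]))
    if (max_logs_per_monitor is not None
            and len(assignments[monitor]) >= max_logs_per_monitor):
        return None
    assignments[monitor].append(new_log)
    return monitor


# State mirroring FIG. 1: monitor 141 holds two logs, monitor 142 holds one.
assignments = {
    "log_monitor_141": ["192.168.1.1", "192.168.1.2"],
    "log_monitor_142": ["192.168.1.3"],
}
chosen = assign_new_log(assignments, "192.168.1.4")
full = assign_new_log(assignments, "192.168.1.5", max_logs_per_monitor=2)
```

Here the new log lands on the monitor that held a single log, and the second call returns `None` because both monitors have reached the assumed limit of two logs.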

At stage D, the log monitor 141 selects the log for the IP address 192.168.1.1 for offloading to the storage device 145. The log monitor 141 iterates through its assigned logs and periodically offloads each log to a corresponding log file on the storage device 145 to prevent an excess amount of log data from accumulating in the memory 135. For example, the log monitor 141 may offload the log for 192.168.1.2 and then ten seconds later begin offloading the log for 192.168.1.1. In an alternative example, the log monitor 141 may alternate offloading log entries for 192.168.1.1 and 192.168.1.2 with no time in between. In some implementations, the log monitor 141 monitors sizes of its assigned logs and offloads a log when the log size satisfies a threshold. For example, each log may have a limit of 2 megabytes, so the log monitor 141 offloads a log when the log is close to or at the 2 megabyte limit. The amount of memory allocated for a log can be configured or can be dynamic based on an amount of available memory or a total number of logs. The log monitor 141 offloads the entries in the log for 192.168.1.1 by reading the entries 30-50 from the memory 135 and then clearing or freeing up the space occupied by those entries in the memory 135. The log monitor 141 may also adjust metadata or header information for the log. For example, the log monitor 141 may update header information that indicates a number of entries in the log to 0. In some implementations, the log monitor 141 may not offload all available entries. For example, the log monitor 141 may be configured to offload 20 entries at a time and to not offload any entries from a log unless there are at least 20 entries available to be offloaded.
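
The batch-offload variant at the end of stage D can be sketched as below. This is an illustrative sketch under stated assumptions: the in-memory log is modeled as a plain Python list, and `offload_entries` and its `batch_size` parameter are hypothetical names.

```python
def offload_entries(log, batch_size=20):
    """Remove and return the oldest batch_size entries from an in-memory log.

    Nothing is offloaded unless at least batch_size entries are available,
    matching the "20 entries at a time" example in the text.
    """
    if len(log) < batch_size:
        return []
    offloaded = log[:batch_size]
    # Free the space held by the offloaded entries.
    del log[:batch_size]
    return offloaded


# Entries 30-50 (21 entries) are waiting in memory for 192.168.1.1.
log = [f"entry {n}" for n in range(30, 51)]
first_batch = offload_entries(log)
second_batch = offload_entries(log)  # only one entry left, so nothing moves
```

After the first call the log holds a single entry, which stays in memory until enough new entries accumulate to fill another batch.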

At stage E, the log monitor 141 compresses and appends the offloaded log entries for the IP address 192.168.1.1 to a log file on the storage device 145. The log monitor 141 invokes the compression tool 144 to compress the offloaded entries. The compression tool 144 can use a variety of compression techniques such as gzip, Roshal Archive (RAR), zip, etc. After the log entries are compressed, the log monitor 141 identifies and accesses the log file for 192.168.1.1. In instances where a log file is not already created, the log monitor 141 creates a log file for the IP address. The log monitor 141 then appends the compressed log entries to the existing data in the log file. Each log file comprises chunks of compressed data which include the offloaded log entries. The log monitor 141 may also modify header or metadata information for the log file to indicate a number of compressed chunks, indicate a number of entries in the compressed data, update a total number of log entries in the log file, etc.
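
One way to realize the compress-and-append step at stage E is to write each offloaded batch as its own gzip member. The sketch below assumes gzip as the compression technique, newline-separated text entries, and a per-device file name of the form `<ip>.log.gz`; the function names and the file naming are assumptions for illustration only.

```python
import gzip
import os
import tempfile


def append_compressed_chunk(log_file_path, entries):
    """Compress a batch of entries and append it to the log file as one chunk."""
    payload = "".join(entry + "\n" for entry in entries).encode("utf-8")
    chunk = gzip.compress(payload)
    # Mode "ab" creates the file if it does not exist and appends otherwise.
    with open(log_file_path, "ab") as log_file:
        log_file.write(chunk)
    return len(chunk)


def read_all_entries(log_file_path):
    """Read every chunk back; gzip.decompress accepts concatenated members."""
    with open(log_file_path, "rb") as log_file:
        data = gzip.decompress(log_file.read())
    return data.decode("utf-8").splitlines()


# A temporary directory stands in for the storage device 145.
with tempfile.TemporaryDirectory() as storage_dir:
    path = os.path.join(storage_dir, "192.168.1.1.log.gz")
    append_compressed_chunk(path, ["entry 30", "entry 31"])
    append_compressed_chunk(path, ["entry 32", "entry 33"])
    restored = read_all_entries(path)
```

Because concatenated gzip members form a valid gzip stream, earlier chunks never need to be rewritten when a new batch is appended.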

At stage F, the log file monitor 146 maintains the log files in the storage device 145. The log file monitor 146 is configured to track sizes of the log files in the storage device 145 and keep each log file below a threshold size. For example, the log file monitor 146 may ensure that each log file is no larger than 100 megabytes. If a log file reaches or exceeds a threshold size, the log file monitor 146 truncates the log file by removing old log entries so that the log file is again below the threshold size. Because the log files comprise multiple compressed chunks of data, the log file monitor 146 reads the file into memory, such as the memory 135, and decompresses each of the chunks. The log file monitor 146 then estimates a size reduction of the decompressed log file that will place the log file in compliance with the size threshold after being recompressed. For example, a compressed log file may be 7 megabytes which exceeds a size threshold of 5 megabytes. After the log file monitor 146 decompresses the log file, the log file may be 14 megabytes in size. The log file monitor 146 can then estimate that removing 6 megabytes from the decompressed log file will result in an approximately 4 megabyte log file after recompression, thereby satisfying the threshold. The log file monitor 146 adjusts its estimate based on a size of the compressed log file, a size of the decompressed log file, the compression technique utilized, a compression ratio, etc. For example, compression techniques are generally less effective on small file sizes, so the log file monitor 146 may remove more decompressed data from a smaller log file than is removed from a larger file. Additionally, less data will be removed from log files for which a more efficient compression technique is used.

In addition to removing enough data to satisfy a size threshold, the log file monitor 146 removes enough data to leave a buffer for additional entries to be added to the log. For example, if a threshold size for a log file is 100 megabytes and a log file is 110 megabytes, the log file monitor 146 may remove enough log entries to reduce the log size to 80 megabytes, which places the log below the threshold and leaves 20 megabytes of buffer for new entries. The log file monitor 146 may adjust a size of the buffer based on available system resources, how quickly log entries are being offloaded to the storage device 145, etc. For example, if the log file monitor 146 has truncated a log a threshold number of times within a time period, the log file monitor 146 may increase the buffer size in order to decrease the frequency with which the log is truncated.

To avoid leaving partial log entries in a log file, the log file monitor 146 performs calculations to determine a number of complete log entries to remove from a log file based on an overall size reduction determined above. For example, given a decompressed log file of 2.1 megabytes which exceeds a 2 megabyte threshold, the log file monitor 146 determines that 0.3 megabytes of data should be removed to achieve a target log file size of 1.8 megabytes, leaving a 10% buffer. The log file monitor 146 calculates a percentage reduction in the overall file size to achieve the target size. To continue the example, the log file monitor 146 calculates that 1 − (Target Size (1.8) / Current Size (2.1)) ≈ 1 − 0.85 = 0.15, or approximately a 15% reduction in size. The log file monitor 146 then multiplies a total number of log entries in the log file by the reduction ratio to determine a number of log entries to remove. For example, if the 2.1 megabyte log file has 210,000 log entries, the log file monitor 146 calculates 210,000*0.15 to determine that 31,500 log entries should be removed from the log file to achieve the 1.8 megabyte target size. The log file monitor 146 removes the determined number of oldest log entries from the log file, recompresses the log file, and again stores the log file on the storage device 145.
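
The entry-count calculation above can be sketched as follows. The sketch makes several assumptions that are not part of the description: sizes are given in bytes (with 1 megabyte taken as 1,000,000 bytes), entries are roughly uniform in size, and the function name `entries_to_remove` is hypothetical. It uses exact integer arithmetic, so for the example it yields 30,000 entries; the rounded 15% ratio used in the text gives the slightly larger figure of 31,500.

```python
def entries_to_remove(current_size_bytes, target_size_bytes, total_entries):
    """Return how many of the oldest entries to drop to reach the target size.

    Computes total_entries * (1 - target / current), rounded up to a whole
    entry, using integer arithmetic to avoid floating-point drift.
    """
    if current_size_bytes <= target_size_bytes:
        return 0
    excess = current_size_bytes - target_size_bytes
    # Ceiling division: -(-a // b) rounds a / b up for positive integers.
    return -(-(total_entries * excess) // current_size_bytes)


def truncate_oldest(entries, count):
    """Drop the count oldest entries, keeping only complete entries."""
    return entries[count:]


# Example from the text: 2.1 MB file, 1.8 MB target, 210,000 entries.
to_remove = entries_to_remove(2_100_000, 1_800_000, 210_000)
remaining = truncate_oldest(list(range(210_000)), to_remove)
```

Rounding up errs on the side of removing one extra entry rather than leaving the file marginally above the target.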

At stage G, the reporting engine 150 generates the log report 151. The reporting engine 150 allows for data collected from the network devices 118 and stored in the log files to be viewed and analyzed. The reporting engine 150 can filter the logged data by IP address and IP domain, display a list of polled devices, allow for text searching of the logged data, etc. For example, in response to receiving the IP address 192.168.1.2, the reporting engine 150 searches the storage device 145 to identify the log file for the indicated IP address. The reporting engine 150 then reads and decompresses the log file and adds entries 20-40 to the log report 151. The reporting engine 150 also searches for and retrieves a log corresponding to the IP address 192.168.1.2 from the memory 135. The reporting engine 150 appends the log entries from the memory 135 to the entries retrieved from the storage device 145 resulting in entries 20-58 in the log report 151. Based on indicated parameters, the reporting engine 150 may further process or filter the log entries in the log report 151. For example, the reporting engine 150 may only include entries that indicate an SNMP polling error.

In some implementations, the log file monitor 146 may not decompress log files prior to truncation as described above at stage F. Instead, the log file monitor 146 may simply remove older compressed chunks of offloaded entries from the log file. For example, if a log file comprises 6 compressed chunks of log entries, the log file monitor 146 may remove the 2 oldest compressed chunks from the log file to place the log file in compliance with a size threshold.

FIG. 1 is annotated with a series of letters A-G. These letters represent stages of operations. Although these stages are ordered for this example, the stages illustrate one example to aid in understanding this disclosure and should not be used to limit the claims. Subject matter falling within the scope of the claims can vary with respect to the order and some of the operations.

FIG. 2 depicts an example system for assignment of logs to instantiated log monitors. FIG. 2 depicts a memory 235 and a log monitor manager 240 that manages a log monitor 0 241, a log monitor 1 242, and a log monitor 2 243. The memory 235 includes logs for 6 IP addresses associated with devices in a network. The logs include data collected from the devices.

FIG. 2 depicts the example system at two different points in time: a time 1 and a time 2. At time 1, the log monitor manager 240 has instantiated the log monitor 0 241 and the log monitor 1 242 to monitor the 6 logs in the memory 235. The log monitor manager 240 uses the depicted formula (Function(IP) % 2) to assign and load balance the logs across the two log monitors. The formula uses the IP address of a log as an argument to a function and takes the modulus 2 of the result of the function to determine which of the two log monitors will be assigned the log. The function may process the IP address in a number of ways. For example, the function may hash the IP address using hash techniques such as the Secure Hash Algorithm or MD5. In FIG. 2, the function converts the IP address into a number so that a modulus 2 of the IP address may be determined. For example, the IP address 192.168.1.1 becomes the number 19,216,811. The modulus 2 of 19,216,811 is 1 which means that the log for the IP address 192.168.1.1 is assigned to the log monitor 1 242. The log monitor manager 240 continues applying the formula to the rest of the logs to determine assignments. For example, the IP address 192.168.1.2 becomes the number 19,216,812 which has a modulus 2 of 0, so the log for the IP address 192.168.1.2 is assigned to the log monitor 0 241. The function may also perform other manipulations to the IP address, such as applying mathematical operations, rounding the IP address, truncating the IP address, etc.

At time 2 (depicted below the dashed line), a log for the IP address 192.168.1.7 has been added to the memory 235. A data collector may have created the log in response to a new device being added to the network. The log monitor manager 240 detects the addition of the new log and determines that an additional log monitor should be instantiated. The determination to instantiate an additional log monitor may be based on various criteria. For example, the log monitor manager 240 may be configured to assign no more than 3 logs to each log monitor. As an additional example, the log monitor manager 240 may be programmed to maintain performance criteria such as x number of log offloads per second and determine that an additional log monitor is needed to satisfy the performance criteria. Based on determining that an additional log monitor is needed, the log monitor manager 240 instantiates the log monitor 2 243 in addition to the log monitor 0 241 and the log monitor 1 242. The log monitor manager 240 then redistributes the log assignments over the log monitors. Since there are now three log monitors, the formula is updated to take a modulus 3 of the Function(IP) result so that three outcomes are possible: 0, 1, and 2. The logs are then assigned to the log monitors based on the updated formula.
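
The FIG. 2 formula can be sketched as below, using the digit-concatenation function from the example (192.168.1.1 becomes 19,216,811). The helper names are hypothetical, and the six addresses 192.168.1.1 through 192.168.1.6 at time 1 are an assumption, since the description states only that there are six logs.

```python
def ip_to_number(ip_address):
    """Convert an IP address to a number by dropping the dots (as in FIG. 2)."""
    return int(ip_address.replace(".", ""))


def assign_logs(ip_addresses, monitor_count):
    """Map each log to monitor index Function(IP) % monitor_count."""
    assignments = {index: [] for index in range(monitor_count)}
    for ip_address in ip_addresses:
        assignments[ip_to_number(ip_address) % monitor_count].append(ip_address)
    return assignments


# Time 1: six logs (addresses assumed) spread over two log monitors.
logs_time_1 = [f"192.168.1.{host}" for host in range(1, 7)]
time_1 = assign_logs(logs_time_1, monitor_count=2)

# Time 2: a seventh log appears and a third log monitor is instantiated,
# so every log is reassigned with modulus 3.
logs_time_2 = logs_time_1 + ["192.168.1.7"]
time_2 = assign_logs(logs_time_2, monitor_count=3)
```

Changing the modulus moves most logs to a different monitor, which is why the description has the manager redistribute all assignments after instantiating a new log monitor. A hash-based function, as the text also suggests, would be a drop-in replacement for `ip_to_number`.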

FIG. 3 is a flowchart of example operations for managing log monitors. The description of FIG. 3 refers to a log monitor manager performing the example operations for naming consistency with FIG. 1, although naming of program code can vary among implementations.

A log monitor manager (“manager”) identifies a plurality of logs in memory to be monitored (302). The manager may periodically scan the memory to identify header information or metadata that indicates the beginning of a log. In some implementations, a data collector manager, such as the data collector manager 130 described in FIG. 1, and any corresponding data collectors may maintain a table in memory that indicates the memory address spaces that have been allocated for logs. When memory space is allocated for a new log, the data collector manager or the responsible data collector updates the shared table with an additional entry and may notify the log monitor manager of the update. Alternatively, the manager may periodically scan the table for current logs and changes to the shared table. The manager also retrieves an identifier for each of the plurality of logs. The identifier may be an IP address associated with data indicated in the log or may be an identifier assigned by a data collector.

The manager determines a number of log monitors to instantiate based on a set of criteria (304). The set of criteria may include configured log assignment thresholds, desired performance metrics, available resources, etc. For example, if the manager is configured to assign no more than 5 logs to each log monitor, the manager divides the number of logs in the plurality of logs by 5 and rounds up to determine the number of log monitors to instantiate. For performance metrics, the manager may be configured to instantiate enough log monitors to provide a throughput of x number of logs offloaded per second. The manager can determine how many log monitors are needed by measuring a time required for a log monitor to offload a log to storage. For example, if the desired throughput is 1 log offloaded per second and a log monitor requires 2 seconds to offload a log, the manager needs at least two log monitors to be offloading logs so that the desired throughput can be achieved. In some instances, the manager may be constrained by available resources, such as processing power, memory, available processor time, processor threads, etc. In these instances, the manager may determine to instantiate as many log monitors as allowed by the available resources and may attempt to dynamically balance resources as needed. For example, if a system executing the log monitors requests additional memory, the manager may instantiate additional log monitors to offload log entries more quickly, thereby freeing up additional memory. Alternatively, if the system requests additional processor resources, the manager may reduce a number of instantiated log monitors, thereby freeing up processor resources. The manager may subscribe to a performance monitor of the system executing the log monitors to receive metrics and alerts related to available resources.
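
The sizing criteria at block 304 can be sketched as a few small calculations. All function and parameter names here are hypothetical, and combining the criteria by taking the larger requirement and then capping it at a resource limit is one plausible policy, not something the description prescribes.

```python
import math


def monitors_for_assignment_limit(log_count, max_logs_per_monitor):
    """Log monitors needed so that none exceeds the per-monitor log limit."""
    # Integer ceiling division: 12 logs with a limit of 5 needs 3 monitors.
    return -(-log_count // max_logs_per_monitor)


def monitors_for_throughput(target_logs_per_second, seconds_per_offload):
    """Log monitors needed to sustain the target offload throughput."""
    return math.ceil(target_logs_per_second * seconds_per_offload)


def monitors_to_instantiate(log_count, max_logs_per_monitor,
                            target_logs_per_second, seconds_per_offload,
                            resource_cap):
    """Take the stricter of the two requirements, capped by resources."""
    needed = max(
        monitors_for_assignment_limit(log_count, max_logs_per_monitor),
        monitors_for_throughput(target_logs_per_second, seconds_per_offload),
    )
    return min(needed, resource_cap)


by_limit = monitors_for_assignment_limit(12, 5)
by_throughput = monitors_for_throughput(1, 2)
planned = monitors_to_instantiate(12, 5, 1, 2, resource_cap=1000)
```

The `resource_cap` value of 1000 echoes the simultaneous file access limit mentioned for FIG. 1 and is used here only as a placeholder.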

The manager instantiates the determined number of log monitors (306). A log monitor is a macro, script, container, application, or other software process that can be invoked or triggered by the manager. For example, if a log monitor is a process that runs within a container, the manager duplicates and begins running containers equal to the determined number of log monitors. As an additional example, if the log monitor is a script, the manager begins executing multiple instantiations of the script and may assign each script its own processor core or processor thread.

The manager assigns the plurality of logs to be monitored across the instantiated log monitors (308). The manager may configure each log monitor during instantiation to monitor specified logs or may maintain a table of log assignments which is shared with the log monitors, which then determine their assignments from the table. The manager may use various techniques to assign the logs. The manager may manually distribute logs across the log monitors or may use a formula to determine log assignments as described in FIG. 2. In some implementations, the manager may analyze data collection rates for each of the logs and use the rates to load balance log assignments across the log monitors. The data collectors may be configured to collect data from network devices at different rates. For example, a data collector may collect data from a first device once per minute and collect data from a second device four times per minute. As a result, the log for the second device will populate more quickly and require a log monitor to more frequently offload the log to storage. The manager retrieves the data collection rates from the data collector manager and then determines a distribution of the logs that effectively load balances offloading operations across the log monitors. To continue the example above, a first log monitor may be solely assigned the log for the second device, for which data is collected 4 times per minute, while a second log monitor may be assigned the log for the first device, for which data is collected 1 time per minute, as well as other logs with low data collection rates. In some instances, if a log has a high data collection rate, the manager may assign two or more log monitors to handle offloading operations for the single log.
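
Rate-aware assignment at block 308 could be approximated with a simple greedy heuristic: process logs from the highest collection rate to the lowest and give each to the log monitor with the smallest accumulated rate. The description does not specify an algorithm, so this is one possible sketch; the function name, device labels, and rates for the third through fifth devices are assumptions.

```python
def balance_by_rate(collection_rates, monitor_count):
    """Greedily assign logs so per-monitor collection rates stay balanced.

    collection_rates maps a log identifier to collections per minute.
    Returns a list with one list of log identifiers per log monitor.
    """
    assignments = [[] for _ in range(monitor_count)]
    loads = [0] * monitor_count
    # Place the busiest logs first; ties keep their original order.
    ordered = sorted(collection_rates.items(),
                     key=lambda item: item[1], reverse=True)
    for log_id, rate in ordered:
        target = loads.index(min(loads))  # least-loaded monitor so far
        assignments[target].append(log_id)
        loads[target] += rate
    return assignments


# First device: once per minute; second device: four times per minute.
rates = {"device_1": 1, "device_2": 4, "device_3": 1,
         "device_4": 1, "device_5": 1}
balanced = balance_by_rate(rates, monitor_count=2)
```

With these rates, one log monitor handles only the log for the second device while the other handles the four low-rate logs, matching the distribution described in the example.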

After assigning the plurality of logs to the instantiated log monitors, the manager begins operations for managing operation and performance of the log monitors (310 and 312). These operations, depicted inside the dashed line box of FIG. 3, occur in parallel and continue throughout management of the log monitors. The manager monitors a number of the plurality of logs in memory and determines whether the number of logs in memory has changed (310). The manager may monitor the logs in memory or monitor indications of logs in a table maintained by a data collector manager.

Additionally, the manager determines whether the log monitors are satisfying performance criteria (312). As described above, the manager may be configured to maintain various performance criteria, such as a threshold throughput of logs offloaded per minute. If the log monitors are not achieving the threshold throughput, the manager determines that the performance criteria are not being satisfied. As an additional example, the manager may be configured to ensure that logs in memory do not exceed a specified size (e.g., 2 megabytes). The manager monitors the sizes of the plurality of logs to determine whether this performance criterion is being satisfied. If logs are frequently nearing or encroaching on the threshold size, the manager determines that the performance criterion is not being satisfied.

If the number of logs has changed (310) or if the log monitors are not satisfying performance criteria (312), the manager changes the number of instantiated log monitors (314). If the number of logs increases, the manager determines that there is an unassigned log. The manager then determines whether there is an available log monitor which can handle the additional load of the unassigned log. If there is not an available log monitor to handle the unassigned log, the manager instantiates an additional log monitor. If the number of logs decreases, the manager determines whether the number of log monitors can be reduced and removes any excess log monitors. In instances where performance criteria are not being satisfied, the manager instantiates additional log monitors. The number of additional log monitors instantiated can change based on a degree to which the performance criteria were not being satisfied. For example, if the log monitors are operating at 10% below performance requirements, the manager may only instantiate one additional log monitor; whereas, if the log monitors are underperforming by 50%, the manager may instantiate 5 additional log monitors. In some instances, the manager may also receive new or different performance criteria which trigger the instantiation of additional log monitors or a decrease in the number of log monitors. For example, a new performance criterion may require the manager to utilize fewer resources and, therefore, decrease the number of log monitors.

The manager reassigns the plurality of logs across the instantiated log monitors (316). The manager assigns logs in a manner similar to that described at block 308. If an additional log monitor was instantiated for an unassigned log, the manager may simply assign the unassigned log to the new log monitor without changing the existing log assignments. After reassignment of the logs, the manager continues operations for managing operation and performance of the log monitors (310 and 312).

The operations of blocks 314 and 316 may be iterative. For example, the manager may instantiate an additional log monitor, reassign the plurality of logs, and evaluate the performance of the log monitors with the additional log monitor. If the performance is still insufficient, the manager may instantiate a second additional log monitor, reassign the plurality of logs, evaluate the performance of the log monitors with the second additional log monitor, and so on.

FIG. 4 is a flowchart of example operations for offloading logs in memory to a storage device. The description in FIG. 4 refers to a log monitor performing the example operations for naming consistency with FIG. 1, although naming of program code can vary among implementations.

A log monitor receives assignment of a set of logs to monitor in memory (402). The log monitor may receive the assignment from a log monitor manager or other process that assigns logs stored in memory of a data collection system. The log monitor may receive identifiers for each of the logs and may search the memory to determine a location of each of the logs. Alternatively, the log monitor may receive a memory address or a pointer to a head of each of the logs. In some implementations, the log monitor may check a table in memory or other storage that indicates log assignments and retrieve its assignments from the table.

The log monitor begins offloading operations for each log in the set of logs (404). The log for which operations are currently being performed is hereinafter referred to as “the selected log.”

The log monitor determines whether a trigger for offloading the selected log has been detected (406). The log monitor may offload the selected log periodically, when the log has reached a specified size, or as requested by another service such as a reporting engine. For example, the log monitor may be programmed to offload each log every 2 minutes. If two minutes have passed since offloading the selected log, the log monitor determines that the selected log should again be offloaded. As an additional example, the log monitor may periodically check a size of the selected log. If the size of the selected log exceeds a specified threshold, the log monitor determines that the selected log should be offloaded.
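The trigger check of block 406 might be sketched as follows. The class InMemoryLog, the function should_offload, and the default values are hypothetical; the 2 minute period comes from the example above, while the 2 megabyte size limit is an assumed value for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InMemoryLog:
    """Hypothetical in-memory log: raw entries plus the last offload time."""
    entries: List[bytes] = field(default_factory=list)
    last_offload: float = 0.0  # seconds, on the same clock as `now`


def should_offload(log: InMemoryLog,
                   now: float,
                   period_s: float = 120.0,
                   max_bytes: int = 2 * 1024 * 1024,
                   requested: bool = False) -> bool:
    """Return True if any of the three example triggers has fired."""
    if requested:                             # request from another service
        return True
    if now - log.last_offload >= period_s:    # periodic trigger
        return True
    size = sum(len(entry) for entry in log.entries)
    return size > max_bytes                   # size trigger
```

A log monitor could call should_offload with the current time (for example, from time.monotonic()) for each selected log in its assigned set.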

If a trigger for offloading the selected log has been detected, the log monitor removes entries in the selected log from memory (408). The log monitor reads some or all of the log entries from memory and then clears the space occupied by those entries. The log monitor may be programmed to offload a specified number of log entries at a time. In such instances, the log monitor offloads the specified number of the oldest log entries from the log. The log monitor may clear the memory space occupied by the entries by changing header or metadata information which indicates a number of entries in the log, by resetting a pointer of a linked list or buffer to the first entry, etc.
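As one possible illustration of block 408, the sketch below models an in-memory log as a double-ended queue and removes a specified number of the oldest entries. The use of collections.deque and the function name take_oldest are assumptions; the description above also contemplates clearing entries by updating header metadata or resetting a buffer pointer.

```python
from collections import deque
from typing import Deque, List, Optional


def take_oldest(log: Deque[bytes],
                max_entries: Optional[int] = None) -> List[bytes]:
    """Remove and return the oldest entries from an in-memory log.

    The oldest entry is assumed to sit at the left end of the deque.
    If `max_entries` is None, every entry is removed.
    """
    count = len(log) if max_entries is None else min(max_entries, len(log))
    return [log.popleft() for _ in range(count)]
```

Removing entries from the left end frees the oldest data first while a data collector continues to append new entries on the right.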

The log monitor compresses the removed log entries (410). The log monitor uses compression techniques such as gzip, Roshal Archive (RAR), zip, etc., to compress the removed log entries.

The log monitor appends the compressed log entries to a corresponding log file in persistent storage (412). The log monitor may use an identifier for the log to locate a log file in the storage. If a log file does not exist, the log monitor creates a new log file and names or associates the log file with the identifier for the log. The log monitor accesses the log file and writes the compressed log entries to the end of the log file. The log monitor may also update metadata associated with the log file that indicates a total number of compressed log entry chunks, a number of entries included in each chunk, a total number of log entries, etc.
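Blocks 410 and 412 might be sketched as follows using gzip, one of the compression techniques named above. The file naming scheme, the newline-delimited entry format, and the function names are assumptions. The sketch relies on the fact that a gzip file may consist of several concatenated members, so each offloaded chunk can be compressed independently and appended to the end of the log file.

```python
import gzip
import os
from typing import List


def append_compressed(log_dir: str, log_id: str, entries: List[bytes]) -> str:
    """Compress `entries` as one gzip chunk and append it to the log file.

    The file is named after the log identifier (a hypothetical convention)
    and is created if it does not yet exist.
    """
    path = os.path.join(log_dir, f"{log_id}.log.gz")
    payload = b"".join(e if e.endswith(b"\n") else e + b"\n" for e in entries)
    chunk = gzip.compress(payload)
    with open(path, "ab") as log_file:  # "ab": append, create if missing
        log_file.write(chunk)
    return path


def read_all(path: str) -> List[bytes]:
    """Read every entry back; gzip.open decodes concatenated members."""
    with gzip.open(path, "rb") as log_file:
        return log_file.read().splitlines()
```

Appending whole chunks avoids rewriting the existing file contents on every offload; metadata such as a chunk count could be tracked separately.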

If a trigger for offloading the selected log has not been detected (406) or after appending the compressed log entries to a log file (412), the log monitor determines whether there is an additional log in the set of logs (414). If there is an additional log, the log monitor selects the next log in the set of logs.

If there is not an additional log, the log monitor determines whether updated log assignments have been issued (416). The log monitor may check if new or additional log assignments have been received from a log monitor manager or determine whether log assignments indicated in a table have changed. The table of log assignments may include a flag at a specified memory location that is set if log assignments have changed. The log monitor may monitor the flag and retrieve updated assignments when the flag has been set. If updated assignments have been issued, the log monitor receives the updated assignment of a set of logs to monitor in memory (402) and begins monitoring operations for the newly assigned logs (404). If updated assignments have not been issued, the log monitor continues offloading operations for the currently assigned set of logs (404).

FIG. 5 is a flowchart of example operations for monitoring log files in persistent storage. The description in FIG. 5 refers to a log file monitor performing the example operations for naming consistency with FIG. 1, although naming of program code can vary among implementations.

A log file monitor identifies a plurality of log files in storage to be monitored (502). The log file monitor may be assigned or detect a volume or other storage location that includes the log files to which log data is being offloaded by a plurality of log monitors. The log file monitor may analyze the plurality of log files to collect metadata for monitoring purposes, determine storage addresses, etc.

The log file monitor begins monitoring operations for each log file in the plurality of log files (504). The log file for which operations are currently being performed is hereinafter referred to as “the selected log file.”

The log file monitor determines whether a size of the selected log file is greater than a threshold size (506). The threshold size is the maximum amount of storage space for a log file. The log file monitor may identify the threshold size in configuration information or may determine the threshold size based on a number of log files and an amount of storage space. For example, if there are 10 log files and 25 gigabytes of storage space, the log file monitor may determine that the threshold size for each log file is 2.5 gigabytes, or the log file monitor may determine that the threshold size is 2 gigabytes and leave 5 gigabytes of storage space as a buffer. The log file monitor may determine the size of the selected log file from metadata information or from file system data and compare the size of the selected log file to the threshold size.
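The threshold arithmetic in the example above can be written out as a short sketch. The function name and parameters are hypothetical.

```python
def per_file_threshold_gb(total_gb: float,
                          num_files: int,
                          buffer_gb: float = 0.0) -> float:
    """Divide the storage left after reserving a buffer evenly among files."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    return (total_gb - buffer_gb) / num_files
```

For 10 log files and 25 gigabytes of storage, the function returns 2.5 gigabytes with no buffer and 2.0 gigabytes when 5 gigabytes are reserved as a buffer, matching the two cases in the example.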

If the log file monitor determines that the size of the selected log file is greater than the threshold size, the log file monitor locks the selected log file and reads the selected log file from storage into memory (508). The log file monitor locks the selected log file to prevent additional log entries or data being written to the log file by a log monitor while the log file monitor is accessing the log file. The log file monitor may lock the selected log file by changing the permissions to read only.

The log file monitor decompresses the selected log file (510). The selected log file comprises a number of compressed chunks of log entries. The log file monitor identifies a compression technique used to compress the chunks and then decompresses each of the chunks using the identified compression technique.

The log file monitor estimates an amount of data to remove from the selected log file (512). Since the selected log file will be recompressed prior to storage, the log file monitor is unable to determine an exact amount of size reduction for the decompressed selected log file which will result in the selected log file being less than the threshold size once recompressed. As a result, the log file monitor makes an estimate based on a set of criteria. The criteria can include a size of the compressed log file, a size of the decompressed log file, the compression technique utilized, an amount by which the size of the selected log file exceeds the threshold, etc. In some instances, the log file monitor may use a formula based on a typical compression ratio for each compression technique. For example, if a first compression technique has a typical compression ratio of 2:1 (i.e., 2 bytes of data compress to 1 byte of data), the log file monitor may use a formula that doubles the amount of compressed data to remove and removes the doubled amount from the decompressed data, e.g., if the compressed log file exceeds the threshold by 1 megabyte, the log file monitor doubles that amount and therefore removes 2 megabytes from the decompressed log file.
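The ratio-based formula in the example above might be sketched as follows. The function name and parameters are hypothetical, and the compression ratio is supplied by the caller rather than measured.

```python
import math


def estimate_bytes_to_remove(compressed_size: int,
                             threshold_size: int,
                             compression_ratio: float) -> int:
    """Estimate how many decompressed bytes to remove.

    `compression_ratio` is the typical ratio of decompressed to compressed
    size (2.0 for a 2:1 technique). The result is an estimate only, since
    the actual ratio depends on the data being compressed.
    """
    excess = max(0, compressed_size - threshold_size)
    return math.ceil(excess * compression_ratio)
```

With a 2:1 ratio, a compressed log file that exceeds the threshold by 1 megabyte yields an estimate of 2 megabytes to remove from the decompressed data.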

The log file monitor removes a number of entries from the selected log file equal to the amount of data to remove (514). The log file monitor identifies a number of the oldest entries in the selected log file that collectively equal or approximately equal the amount of data to be removed. The log file monitor deletes the entries from the selected log file and updates any header information or metadata accordingly.
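Block 514 might be sketched as follows, assuming the decompressed log file is held as a list of entries ordered from oldest to newest. The function name is hypothetical, and the choice to round up to a whole entry (removing at least the requested amount) is an assumption; an implementation could instead round down.

```python
from typing import List


def drop_oldest(entries: List[bytes], bytes_to_remove: int) -> List[bytes]:
    """Return the entries that remain after removing the oldest ones.

    Whole entries are removed from the front of the list until their
    combined size is at least `bytes_to_remove`.
    """
    removed = 0
    index = 0
    while index < len(entries) and removed < bytes_to_remove:
        removed += len(entries[index])
        index += 1
    return entries[index:]
```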

The log file monitor recompresses and replaces the selected log file in storage (516). The log file monitor recompresses the selected log file which has had the number of entries removed. The log file monitor may then delete the old version of the selected log file from storage or overwrite the old version of the selected log file in storage with the new, smaller version of the selected log file. In some implementations, the log file monitor may ensure that the new selected log file is below the threshold size after recompression, and if the log file is not below the threshold, the log file monitor may decompress and remove additional entries from the selected log file prior to storage.
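The optional verification described above might be sketched as a loop that recompresses, checks the result against the threshold, and removes further entries if needed. The sketch assumes gzip compression and keeps the decompressed entries in memory between attempts rather than decompressing again; the function name and the `step` parameter are hypothetical.

```python
import gzip
from typing import List, Tuple


def truncate_to_threshold(entries: List[bytes],
                          threshold_bytes: int,
                          step: int = 1) -> Tuple[List[bytes], bytes]:
    """Drop oldest entries until the recompressed data is below the threshold.

    Returns the remaining entries and their compressed form. `step` is the
    number of additional oldest entries removed after each failed check.
    """
    remaining = list(entries)
    while True:
        blob = gzip.compress(b"\n".join(remaining))
        if len(blob) < threshold_bytes or not remaining:
            return remaining, blob
        remaining = remaining[step:]  # remove more of the oldest entries
```

The returned compressed data could then be written over the old version of the log file before the file is unlocked.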

After recompressing and replacing the selected log file in storage (516) or if the log file monitor determines that the size of the selected log file is not greater than the threshold size (506), the log file monitor determines whether there is an additional log file (518). If there is an additional log file, the log file monitor selects the next log file (504). If there is not an additional log file, the process ends.

Variations

The flowcharts are provided to aid in understanding the illustrations and are not to be used to limit scope of the claims. The flowcharts depict example operations that can vary within the scope of the claims. Additional operations may be performed; fewer operations may be performed; the operations may be performed in parallel; and the operations may be performed in a different order. For example, the operations depicted in blocks 306 and 308 of FIG. 3 can be performed in parallel or concurrently. Additionally, the operation depicted in block 416 of FIG. 4 may not be performed. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by program code. The program code may be provided to a processor of a general purpose computer, special purpose computer, or other programmable machine or apparatus.

Some operations above iterate through sets of items, such as logs in memory or log files (“logs”). In some implementations, logs may be iterated over according to an ordering of logs, an indication of log importance, a timestamp associated with each log, a device type associated with each log, a size of each log, etc. Also, the number of iterations for loop operations may vary. Different techniques for processing logs may require fewer iterations or more iterations. For example, multiple logs may be offloaded from memory in parallel. Similarly, multiple log files may be truncated to comply with storage thresholds in parallel.

The examples often refer to a data collection manager and a log monitor manager. The term manager is a construct used to refer to implementation of functionality for instantiating, controlling, and monitoring a collection of agents or software processes. This construct is utilized since numerous implementations are possible. A manager may be a hypervisor with additional program code, an application, a particular component or components of a machine (e.g., a particular circuit card enclosed in a housing with other circuit cards/boards), machine-executable program or programs, firmware, a circuit card with circuitry configured and programmed with firmware for instantiating monitor and collector software, etc. The term is used to efficiently explain content of the disclosure. Although the examples refer to operations being performed by managers, different entities can perform different operations. For instance, a dedicated co-processor or application specific integrated circuit can instantiate, control, and monitor a collection of agents or software processes.

As will be appreciated, aspects of the disclosure may be embodied as a system, method or program code/instructions stored in one or more machine-readable media. Accordingly, aspects may take the form of hardware, software (including firmware, resident software, micro-code, etc.), or a combination of software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” The functionality presented as individual modules/units in the example illustrations can be organized differently in accordance with any one of platform (operating system and/or hardware), application ecosystem, interfaces, programmer preferences, programming language, administrator preferences, etc.

Any combination of one or more machine readable medium(s) may be utilized. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable storage medium may be, for example, but not limited to, a system, apparatus, or device, that employs any one of or combination of electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology to store program code. More specific examples (a non-exhaustive list) of the machine readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a machine readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine readable storage medium is not a machine readable signal medium.

A machine readable signal medium may include a propagated data signal with machine readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A machine readable signal medium may be any machine readable medium that is not a machine readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a machine readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as the Java® programming language, C++ or the like; a dynamic programming language such as Python; a scripting language such as the Perl programming language or PowerShell script language; and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on a stand-alone machine, may execute in a distributed manner across multiple machines, and may execute on one machine while providing results and/or accepting input on another machine.

The program code/instructions may also be stored in a machine readable medium that can direct a machine to function in a particular manner, such that the instructions stored in the machine readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

FIG. 6 depicts an example computer system with a scalable data logging application. The computer system includes a processor unit 601 (possibly including multiple processors, multiple cores, multiple nodes, and/or implementing multi-threading, etc.). The computer system includes memory 607. The memory 607 may be system memory (e.g., one or more of cache, SRAM, DRAM, zero capacitor RAM, Twin Transistor RAM, eDRAM, EDO RAM, DDR RAM, EEPROM, NRAM, RRAM, SONOS, PRAM, etc.) or any one or more of the above already described possible realizations of machine-readable media. The computer system also includes a bus 603 (e.g., PCI, ISA, PCI-Express, HyperTransport® bus, InfiniBand® bus, NuBus, etc.) and a network interface 605 (e.g., a Fiber Channel interface, an Ethernet interface, an internet small computer system interface, SONET interface, wireless interface, etc.). The system also includes a scalable data logging application 611. The scalable data logging application 611 scales operations for offloading logs from memory to storage based on a number of logs and performance criteria. Any one of the previously described functionalities may be partially (or entirely) implemented in hardware and/or on the processor unit 601. For example, the functionality may be implemented with an application specific integrated circuit, in logic implemented in the processor unit 601, in a co-processor on a peripheral device or card, etc. Further, realizations may include fewer or additional components not illustrated in FIG. 6 (e.g., video cards, audio cards, additional network interfaces, peripheral devices, etc.). The processor unit 601 and the network interface 605 are coupled to the bus 603. Although illustrated as being coupled to the bus 603, the memory 607 may be coupled to the processor unit 601.

While the aspects of the disclosure are described with reference to various implementations and exploitations, it will be understood that these aspects are illustrative and that the scope of the claims is not limited to them. In general, techniques for scalable log offloading operations as described herein may be implemented with facilities consistent with any hardware system or hardware systems. The variations described above do not encompass all possible variations, implementations, or embodiments of the present disclosure. Many variations, modifications, additions, and improvements are possible.

Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the disclosure. In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure.

Use of the phrase “at least one of” preceding a list with the conjunction “and” should not be treated as an exclusive list and should not be construed as a list of categories with one item from each category, unless specifically stated otherwise. A clause that recites “at least one of A, B, and C” can be infringed with only one of the listed items, multiple of the listed items, and one or more of the items in the list and another item not listed.

What is claimed is:
 1. A method comprising: detecting a plurality of logs in memory of a first system, wherein each log of the plurality of logs comprises data collected from a network device; instantiating a first monitor and a second monitor; distributing assignments for offloading the plurality of logs from the memory to a storage device across the first monitor and the second monitor; and based on determining that the first monitor and the second monitor do not satisfy performance criteria, instantiating a third monitor; and redistributing assignments for offloading the plurality of logs across the first monitor, the second monitor, and the third monitor.
 2. The method of claim 1 further comprising: offloading, by the first monitor, a first log of the plurality of logs from the memory of the first system to the storage device, wherein offloading the first log comprises, detecting a trigger for offloading the first log; and based on detecting the trigger, removing the first log from the memory; compressing the first log; and appending the compressed first log to a log file corresponding to the first log in the storage device.
 3. The method of claim 2, wherein detecting the trigger for offloading the first log comprises at least one of: determining that the first log has exceeded a threshold size; receiving a request to offload the first log; completing offloading of a second log of the plurality of logs; and determining that a period of time has elapsed since the first log was previously offloaded.
 4. The method of claim 1, wherein determining that the first monitor and the second monitor satisfy the performance criteria comprises at least one of: determining that the first monitor and the second monitor are consuming more than an allotted amount of resources on the first system; determining that the first monitor and the second monitor are not offloading a threshold number of logs within a time period; and determining that the first monitor and the second monitor are not maintaining each of the plurality of logs below a threshold size.
 5. The method of claim 1, wherein distributing the assignments for offloading the plurality of logs across the first monitor and the second monitor comprises: for each of plurality of logs, determining an identifier for the log; and determining whether to assign the log to the first monitor or the second monitor based, at least in part, on evaluating a function that uses the identifier as an argument.
 6. The method of claim 1, wherein distributing the assignments for offloading the plurality of logs across the first monitor and the second monitor comprises: determining a data collection rate associated with each of plurality of logs; and distributing the assignments for offloading the plurality of logs across the first monitor and the second monitor based, at least in part, on an analysis of the data collection rates.
 7. The method of claim 1 further comprising: monitoring a log file in the storage device, wherein the log file corresponds to a first of the plurality of logs and comprises compressed data offloaded from the first log; and based on determining that the log file has exceeded a threshold size, locking the log file on the storage device; reading and decompressing the log file; removing an estimated amount of data from the decompressed log file, wherein the estimated amount of data is determined based, at least in part, on a set of criteria; and recompressing and storing the log file on the storage device.
 8. The method of claim 7, wherein the set of criteria comprises at least one of a type of compression technique used to compress data in the log file, an amount with which the log file exceeds the threshold size, a decompressed size of the log file, and a configured amount of buffer space to be allotted for new log data.
 9. The method of claim 7 further comprising: determining a number of instances that the log file has exceeded the threshold size over a period of time; and based on determining that the number of instances exceeds a threshold, increasing the estimated amount of data to remove from the decompressed log file.
 10. One or more non-transitory machine-readable media comprising program code for managing a scalable logging system, the program code to: detect a plurality of logs in memory of a first system, wherein each log of the plurality of logs comprises data collected from a network device; instantiate a first monitor and a second monitor; distribute assignments for offloading the plurality of logs from the memory to a storage device across the first monitor and the second monitor; and based on a determination that the first monitor and the second monitor do not satisfy performance criteria, instantiate a third monitor; and redistribute assignments for offloading the plurality of logs across the first monitor, the second monitor, and the third monitor.
 11. The machine-readable media of claim 10 further comprising program code to: offload, by the first monitor, a first log of the plurality of logs from the memory of the first system to the storage device, wherein the program code to offload the first log comprises program code to, detect a trigger for offloading the first log; and based on detection of the trigger, remove the first log from the memory; compress the first log; and append the compressed first log to a log file corresponding to the first log in the storage device.
 12. An apparatus comprising: a processor; and a machine-readable medium having program code executable by the processor to cause the apparatus to, detect a plurality of logs in memory of a first system, wherein each log of the plurality of logs comprises data collected from a network device; instantiate a first monitor and a second monitor; distribute assignments for offloading the plurality of logs from the memory to a storage device across the first monitor and the second monitor; and based on a determination that the first monitor and the second monitor do not satisfy performance criteria, instantiate a third monitor; and redistribute assignments for offloading the plurality of logs across the first monitor, the second monitor, and the third monitor.
 13. The apparatus of claim 12, further comprising program code executable by the processor to cause the apparatus to: offload, by the first monitor, a first log of the plurality of logs from the memory of the first system to the storage device, wherein the program code executable by the processor to cause the apparatus to offload the first log comprises program code executable by the processor to cause the apparatus to, detect a trigger for offloading the first log; and based on detection of the trigger, remove the first log from the memory; compress the first log; and append the compressed first log to a log file corresponding to the first log in the storage device.
 14. The apparatus of claim 13, wherein the program code executable by the processor to cause the apparatus to detect the trigger for offloading the first log comprises program code executable by the processor to cause the apparatus to at least one of: determine that the first log has exceeded a threshold size; receive a request to offload the first log; complete offloading of a second log of the plurality of logs; and determine that a period of time has elapsed since the first log was previously offloaded.
 15. The apparatus of claim 12, wherein the program code executable by the processor to cause the apparatus to determine that the first monitor and the second monitor satisfy the performance criteria comprises program code executable by the processor to cause the apparatus to at least one of: determine that the first monitor and the second monitor are consuming more than an allotted amount of resources on the first system; determine that the first monitor and the second monitor are not offloading a threshold number of logs within a time period; and determine that the first monitor and the second monitor are not maintaining each of the plurality of logs below a threshold size.
 16. The apparatus of claim 12, wherein the program code executable by the processor to cause the apparatus to distribute the assignments for offloading the plurality of logs across the first monitor and the second monitor comprises program code executable by the processor to cause the apparatus to: for each of plurality of logs, determine an identifier for the log; and determine whether to assign the log to the first monitor or the second monitor based, at least in part, on evaluating a function that uses the identifier as an argument.
 17. The apparatus of claim 12, wherein the program code executable by the processor to cause the apparatus to distribute the assignments for offloading the plurality of logs across the first monitor and the second monitor comprises program code executable by the processor to cause the apparatus to: determine a data collection rate associated with each of plurality of logs; and distribute the assignments for offloading the plurality of logs across the first monitor and the second monitor based, at least in part, on an analysis of the data collection rates.
 18. The apparatus of claim 12 further comprising program code executable by the processor to cause the apparatus to: monitor a log file in the storage device, wherein the log file corresponds to a first of the plurality of logs and comprises compressed data offloaded from the first log; and based on a determination that the log file has exceeded a threshold size, lock the log file on the storage device; read and decompress the log file; remove an estimated amount of data from the decompressed log file, wherein the estimated amount of data is determined based, at least in part, on a set of criteria; and recompress and store the log file on the storage device.
 19. The apparatus of claim 18, wherein the set of criteria comprises at least one of a type of compression technique used to compress data in the log file, an amount with which the log file exceeds the threshold size, a decompressed size of the log file, and a configured amount of buffer space to be allotted for new log data.
 20. The apparatus of claim 18 further comprising program code executable by the processor to cause the apparatus to: determine a number of instances that the log file has exceeded the threshold size over a period of time; and based on a determination that the number of instances exceeds a threshold, increase the estimated amount of data to remove from the decompressed log file.